Fix GPU Softmax NaN propagation for mixed infinite inputs #33498
Conversation
```c
// Handle IEEE-754 case when max_value is INF
if (isinf((float)max_value)) {
    for (cls = 0; cls < class_num; ++cls) {
        ACCUMULATOR_TYPE v = data[cls * TMP_CLASS_PITCH];
        if (v == max_value)
            data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)NAN;
        else
            data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)0.0f;
    }

    // Write results and exit
    for (cls = 0; cls < class_num; ++cls) {
#if INPUT0_SIMPLE == 1
        const uint output_idx = out_depth_offset + cls*OUTPUT_CLASS_PITCH;
#else
#if INPUT0_DIMS == 5
        const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, z + *z_offset, y + *y_offset, x + *x_offset);
#else
        const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, y + *y_offset, x + *x_offset);
#endif
#endif
        output[output_idx] = data[cls * TMP_CLASS_PITCH];
    }
    return;
}
```
I think we need to apply fused ops for the reported case too.
Please check my suggestion below.
```diff
-// Handle IEEE-754 case when max_value is INF
-if (isinf((float)max_value)) {
-    for (cls = 0; cls < class_num; ++cls) {
-        ACCUMULATOR_TYPE v = data[cls * TMP_CLASS_PITCH];
-        if (v == max_value)
-            data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)NAN;
-        else
-            data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)0.0f;
-    }
-    // Write results and exit
-    for (cls = 0; cls < class_num; ++cls) {
-#if INPUT0_SIMPLE == 1
-        const uint output_idx = out_depth_offset + cls*OUTPUT_CLASS_PITCH;
-#else
-#if INPUT0_DIMS == 5
-        const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, z + *z_offset, y + *y_offset, x + *x_offset);
-#else
-        const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, y + *y_offset, x + *x_offset);
-#endif
-#endif
-        output[output_idx] = data[cls * TMP_CLASS_PITCH];
-    }
-    return;
-}
+for (cls = 0; cls < class_num; ++cls) {
+    // Handle IEEE-754 case when max_value is INF
+    if (isinf((float)max_value)) {
+        if (data[cls*TMP_CLASS_PITCH] == max_value)
+            data[cls*TMP_CLASS_PITCH] = TO_ACCUMULATOR_TYPE(NAN);
+        else
+            data[cls*TMP_CLASS_PITCH] = TO_ACCUMULATOR_TYPE(0.0f);
+    } else {
+        ACCUMULATOR_TYPE t = native_exp(data[cls*TMP_CLASS_PITCH] - max_value);
+        denominator += t;
+        data[cls*TMP_CLASS_PITCH] = t;
+    }
+}
+// ....
+for (cls = 0; cls < class_num; ++cls) {
+    ACCUMULATOR_TYPE res = data[cls*TMP_CLASS_PITCH];
+    if (!isinf((float)max_value)) {
+        res = res / denominator;
+    }
```
Thanks for the suggestion, but I've already reworked the INF handling so it is processed inside the main computation loops. This ensures fused ops and activation are applied consistently on both the INF and non-INF paths. I also verified the original issue cases and the mixed-INF edge cases against the CPU implementation.
e-ddykim left a comment
Additionally, could you please add unit tests for the issue case?
I've added comprehensive GPU unit tests covering all reported IEEE-754 edge cases (mixed INF, multiple INF, negative INF, and NaN propagation).

build_jenkins
src/tests/functional/plugin/shared/include/single_op_tests/softmax.hpp (outdated, resolved)
src/plugins/intel_gpu/tests/functional/single_layer_tests/softmax.cpp (outdated, resolved)
…max.cpp Co-authored-by: Pawel Raasz <pawel.raasz@intel.com>
Applied the suggestion, kindly review.

Waiting for PR approval, kindly review.
praasz
left a comment
OK for the generic test part. @e-ddykim, @Lyamin-Roman, could you review?
🐞 Bug Fix: Correct NaN Propagation in GPU Softmax with Mixed Infinite Inputs
This PR fixes a numerical stability bug in the GPU implementation of the Softmax operation where inputs containing a mix of inf and finite values produce incorrect results.

🔬 Problem Summary

For an input such as:

[inf, 1.0, 2.0]

the mathematically correct (IEEE-754 compliant) Softmax result is:

[nan, 0, 0]
However, the GPU kernel previously produced:
[nan, nan, nan]
while the CPU implementation already behaves correctly.
The root cause is that when max_value == inf, the kernel evaluates:

exp(inf - inf) → NaN

which contaminates the denominator and propagates NaNs to all output positions.
🛠️ Solution

The GPU Softmax kernel now explicitly detects the max_value == inf case and applies IEEE-754 semantics:

- elements equal to max_value → NaN
- all other elements → 0.0

This avoids unstable exponential evaluation and ensures GPU output exactly matches CPU behavior.
The fix is minimal, local to the kernel, and does not affect performance for normal inputs.
📈 Impact
| Input | Expected (CPU) | GPU before fix | GPU after fix |
|---|---|---|---|
| [inf, 1, 2] | [nan, 0, 0] | [nan, nan, nan] | [nan, 0, 0] |
| [inf, -inf, 1] | [nan, 0, 0] | [nan, nan, nan] | [nan, 0, 0] |
| [-inf, 1, 2] | [0, .2689, .7311] | [0, .2689, .7311] | [0, .2689, .7311] |

🧩 Files Modified
src/plugins/intel_gpu/src/kernel_selector/cl_kernels/softmax_gpu_ref.cl
🧪 Testing
Verified against the reproducer from issue #33456.
GPU output now matches CPU output for all reported edge cases.
🔗 Related Issue
Fixes #33456